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Abstract — In the last decade, a new computational paradigm 
was introduced in the field of Machine Learning, under the name 
of Reservoir Computing (RC). RC models are neural networks 
which a recurrent part (the reservoir) that does not participate 
in the learning process, and the rest of the system where no 
recurrence (no neural circuit) occurs. This approach has grown 
rapidly due to its success in solving learning tasks and other 
computational applications. Some success was also observed 
with another recently proposed neural network designed using 
Queueing Theory, the Random Neural Network (RandNN). Both 
approaches have good properties and identified drawbacks. 

In this paper, we propose a new RC model called Echo 
State Queueing Network (ESQN), where we use ideas coming 
from RandNNs for the design of the reservoir. ESQNs consist 
in ESNs where the reservoir has a new dynamics inspired by 
recurrent RandNNs. The paper positions ESQNs in the global 
Machine Learning area, and provides examples of their use and 
performances. We show on largely used benchmarks that ESQNs 
are very accurate tools, and we illustrate how they compare with 
standard ESNs. 

Index Terms - Reservoir Computing, Echo State Network, 
Random Neural Network, Queueing Network, Machine Learning 

I. Introduction 

Artificial Neural Networks (ANNs) are a class of compu- 
tational models which have been proven to be very powerful 
as statistical learning tools to solve complicated engineering 
tasks as well as many theoretical issues. Several types of 
ANNs have been designed, some of them originating in 
the field of Machine Learning while others coming from 
biophysics and neuroscience. The Random Neural Network 
(RandNN) proposed by E. Gelenbe in 1989 jT|, is a mathe- 
matical object inspired by biological neuronal behavior which 
merges features of Spiking Neural Networks and Queueing 
Networks. In the literature, actually two different interpreta- 
tions of exactly the same mathematical model are proposed. 
One is a type of spiking neuron and the associated network 
which is called RandNNs. The other one is a new type of 
queue and networks of queues, respectively called G-queues 
and G-networks. The RandNN is a connectionist model where 
spikes circulate among the interconnected neurons. A discrete 
state-space is used to represent the internal state (potential) of 
each neuron. The firing times of the spikes are modeled as 
Poisson processes. The potential of each neuron is represented 
by a positive integer that increases when a spike arrives or 
decreases after the neuron fires. In order to use RandNNs in 



supervised learning problems, a gradient descent algorithm 
has been described in [2], and Quasi-Newton methods have 
been proposed in |3), Q. Additionally, the function approxi- 
mation properties of the model were studied in (5J, |6j. The 
structure of the model leads to efficient numerical evaluation 
procedures, to good performance in learning algorithms and 
to easy hardware implementations. Consequently, since its 
introduction the model has been applied in a variety of 
scientific fields. Nevertheless, the RandNNs model suffers 
from limitations. Some of them are related to the use of 
a feedforward topology (see (7)). The original acronym to 
refer the model was RNN. In this work to avoid a conflict 
of notation, we use RandNN for Random Neural Networks, 
due to the use of RNNs in Machine Learning literature for 
Recurrent Neural Networks. 

Concerning models with recurrences (circuits) in their 
topologies, they are recognized as powerful tools for a number 
of tasks in Machine Learning (both traditional ANNs and 
RandNNs). However, they have a main limitation which 
comes from the difficulty in implementing efficient training 
algorithms. The main drawbacks related to learning algorithms 
are the following: convergence is not always guaranteed, many 
algorithmic parameters are involved, sometimes long training 
times are required |K|. [9|. For all those reasons learning using 
recurrent neural networks is principally feasible for relatively 
small networks. 

Recently, a new paradigm called Reservoir Computing 
(RC) has been developed which overcome the main draw- 
backs of learning algorithms applied to networks with cyclic 
topologies. About ten years ago two main RC models were 
proposed: Echo State Networks (ESNs) flO) and Liquid State 
Machines (LSMs) [11 1. Both models describe the possibility 
of using recurrent neural networks without adapting the weight 
connections involved in recurrences. The network outputs are 
generated using very simple learning methods such as clas- 
sification or regression models. The RC approach have been 
successfully applied in many machine learning tasks achieving 



goods results, specially in temporal learning tasks |8j, |11|, 

In this paper we introduce a new type of RC method 
which uses some ideas from RandNNs. The paper is organized 
as follows: we begin by describing the RandNN model in 
Section [II] In Section III, we introduce the two funding 
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RC models. Section [TV] discusses the contribution of this 
paper, a new RC model similar to the ESN, but using also 
ideas inspired by queuing theory. Finally, we present some 
experimental results and we end with some conclusions as 
well as a discussion regarding future lines of research. 

II. Description of the 
Random Neural Network Model 

A Random Neural Network (RandNN) is a specific queuing 
network proposed in JT] which merges concepts from spik- 
ing neural networks and queuing theory. Depending on the 
context, its nodes are seen as queues or as spiking neurons. 
Each of these neurons receives spikes (pulses) from outside, 
which are of one out of two disjoint types, called excitatory 
(or positive) and inhibitory (or negative). Associated with 
a neuron there is an integer variable called the neuron's 
potential. Each time a neuron receives an excitatory spike, its 
potential is increased by one. If a neuron receives an inhibitory 
spike and its potential was strictly positive, it decreases by 
one; if it was equal to 0, it remains at that value. As far as 
the neuron's potential is strictly positive, the neuron sends 
excitatory spikes to outside. When the neuron's potential is 
strictly positive, we say that the neuron is excited or active. 
After numbering the neurons in an arbitrary order, let's denote 
by S u (t) the potential of neuron u at time t. During the pe- 
riods when the neuron is active, it produces excitatory spikes 
with some rate r u > 0. In other words, the output process 
of the pulses coming out of an active neuron is a Poisson 
process. Then, a spike produced by neuron u is transferred 
to the environment with probability d u . For each synapse 
between neuron u and v an excitatory spike (respectively 
inhibitory spike) produced by u is switched to neuron v with 
probability p+ u (respectively p^ u ). In the literature related to 
RandNNs, the probability that a pulse generated at neuron u 
goes to neuron v is usually denoted by pt/ v . This is different 
from the notation used in the standard ANNs literature, where 
a direct connection between u and v is often denoted as (v, u), 
that is, in the reverse order. In this paper we follow the latter 
notation. This routing procedure is performed independently 
of anything else happening in the network, including previous 
or future switches at the same neuron or at any other one. 
Observe that for any neuron u we have 

N N 
v=l v=l 

where N is the number of neurons in the network. The weight 
connection between any two neurons u and v (u sending 
spikes to v) is defined as: w+ u = r u p+ u and w~ u = r u p~ u . 

Let us assume that the external (i.e. from the environment) 
arrival process of positive (respectively negative) spikes to 
neuron u is Poisson with rate A+ (respectively with rate A~). 
Some of these rates can be 0, meaning that no spike of the 
considered type arrives at the given neuron coming from the 
network's environment. In order to avoid the trivial case where 
nothing happens, we also must assume that Ylu=i > 



(otherwise, the network is composed of neurons that are in- 
active at all times). Last, the usual independence assumptions 
between all the considered Poisson and routing processes in 
the model are assumed. 

We call S(t) = (Si{t), ■ ■ ■ , S N (t)) the state of the network 
at time t. Observe that S is a continuous time Markov process 
over the state space N w . We will assume that S is irreducible 
and ergodic. We are interested in the network's behavior in 
steady-state, so, let us assume that S is in equilibrium (that 
is, assume S is stationary). Let g u be the probability (in 
equilibrium) that neuron u is excited, 

q u = lim P(S u (t) > 0). 

t— f oo 

This parameter is called the activity rate of neuron u. Since 
process S is ergodic, for all neron u we have < g u < 1. 
Gelenbe in (T], fl3) shows that in an equilibrium situation 
the g u s satisfy the following non-linear system of equations: 

T+ 

for each node u, g u = — — , (1) 

r u + T u 

N 

for each node u, T+ = A+ + g v w^ v , (2) 

v=l 
N 

for each node u, T~ = \~ + >J Q v w~ v , (3) 

with the supplementary condition that, for all neuron u, we 
have g u < 1, In other words, under the assumption of 
irreducibility, if the system of equations ([TJ, |2]) and ([3]l has 
a solution (gi, - ■ ■ , qn) such that we have g u < 1, for all 
neuron u, then the solution is unique and the Markov process 
is ergodic. Moreover, its stationary distribution is given by the 
product of the marginal probabilities of the neuron's potential. 
For more details and proofs, see |TJ, J5j- 

As a learning tool used to learn some unknown function, we 
map the function's variables to the external arrival rates, the 
A+s and A~s numbers (however, usually we set set A,7 = 
for all input neuron u, so we map the function's variables to 
the Ajs only). The network's output is the set of loads. The 
learning parameters are the set of weights in the model. An 
appropriate optimization method (such as Gradient Descent) 
is used to find weights such that when the arrival rate (of 
positive spikes) equals the input data, the network output 
matches (with small error) the corresponding known output 
data values. The model has been widely used in fields such 
as: combinatorial optimization, machine learning problems, 
communication networks and computer systems |l4)-|[T8j. 

III. Reservoir computing methods 

Recurrent Neural Networks are a large class of com- 
putational models used in several applications of Machine 
Learning and in neurosciences. The main characteristic of this 
type of ANNs is the existence of at least one feedback loop 
among the connections, that is, a circuit of connections. The 
cyclic topology causes that the non-linear transformation of 
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the input history can be stored in internal states. Hence 
recurrent neural networks are a powerful tool for forecasting 
and time series processing applications. They are also very 
useful for building associative memories, in data compression 
and for static pattern classification (8). However, in spite of 
these important abilities and of the fact that we have efficient 
algorithms for training neural networks without recurrences, 
no efficient algorithms exist for the case where recurrences 
are present. 

Since the early 2000s, Reservoir Computing has gained 
prominence in the ANN community. In the two basic forms 
of the model described before, ESNs and LSMs, at least three 
well-differenced structures can be identified: the input layer, 
where neurons receive information from the environment; 
the reservoir (in ESNs) or liquid (in LSMs), a nonlinear 
"expansion" function implemented using a recurrent neuronal 
network; the readout, which is usually a linear function or 
a neural network without recurrences, producing the desired 
output. 

The weight connections among neurons in the reservoir and 
the connections between input and reservoir neurons are fixed 
during the learning process, only the weights between input 
neurons and readout units, and between reservoir and readout 
units, are the object of the training process. The reservoir 
with its recurrences or circuits, allows a kind of "expansion" 
of the input and possibly of history data into a larger space. 
From this point of view, the reservoir idea is similar to the 
expansion function used in Kernel Methods, for example in 
the Support Vector Machine [19|. The projection can enhance 
the linear separability of the data | 12 1. On the other hand, the 
readout layer is built to be performant in learning, specially to 
be robust and fast in this process. The RC approach is based 
on the empirical observation that under certain assumptions, 
training only a linear readout is often sufficient to achieve 
good performance in many learning tasks (8). For instance, 
the ESN model has the best known learning performance on 
the Mackey-Glass times series prediction task (20) , J2"T) . 

The topology of a RC model consists of an input layer 
with N a units sending pulses to the reservoir (and possibly 
also to the readout), a recurrent neural network with N x units, 
where N a <C N x , and a layer with iVj, readout neurons having 
adjustable connections from the reservoir (and possibly from 
the input) layer(s). 

The main difference between LSMs and ESNs consists in 
the type of nodes included in the reservoir. In the original 
LSM model the liquid was built using a model derived from 
Hodgkin-Huxley's work, called Leaky Integrate and Fire (LFI) 
neurons. In the standard ESN model, the activation function of 
the units is most often tanh(-). An ESN is basically a three- 
layered NN where only the hidden layer has recurrences, but 
allowing connections from input to readout (and, again, where 
learning is concentrated in the readout only). Our training 
data consists of K pairs (a'*', b^'), k = 1, . . . , K, of input- 
output values of some unknown function /, where a( fc ) € 
R N % bW € R Nh and b^ = /(a^). The weights matrices 



are w m (connections from input to reservoir), w 1 (connections 
inside the reservoir) and w out (connections between input or 
reservoir and readout), of dimensions N x x (1 + N a ), N x x N x 
and x (1 + N a + N x ), respectively. The first rows of w ln 
and w out contain ones corresponding to the bias terms. 

Each neuron j of the reservoir has a real state Xj, When 
the input a arrives to the ESN, the reservoir first updates its 
state x = (sci, . . . , £at x ) by executing 

x := tanh(w in [l;a] + w r x), (4) 

and then, the ESN computes its outputs 

y :=w° ut [l;a;x], (5) 

where [•; •] is the vertical vector concatenation. 

If we think of the ESN has a dynamical system receiving 
a time series of inputs a(l),a(2), . . . and producing a series 
of outputs y(l), y(2), . . ., the corresponding series of state 
values evolves according to 

x(£) =tanh (w in [l;a(t)]+w r x(£-l)), 

with the output at t computed by y(£) := w out [l; a(£); x(£)]. 

To ensure good properties in the reservoir, the w r ma- 
trix is usually scaled to control its spectral radius (to have 
p(w r ) < 1) [ 10 1 . The role of the spectral radius is more 
complex when the reservoir is built with spiking neurons (in 
the LSM model) (12), (22). 

Several extensions of the two pioneering RC models 
have been suggested in the literature, such as: intrinsic 
plasticity [23 1, backpropagation-decorrelation (24) , decoupled 
ESN (25), leaky integrator (26), Evolino (20), etc. 

IV. A new Reservoir Computing method: 
Echo State Queuing Networks 

In this paper, we propose to reach the objective of si- 
multaneously keeping the good properties of the two models 
previously described. For this purpose, we introduce the Echo 
State Queuing Network (ESQN), a new RC model where the 
reservoir dynamics is based on a specific type of queuing 
network (RandNN) behavior in steady-state. 

The architecture of an ESQN consists of an input layer, a 
reservoir and a readout layer. The input layer is composed 
of N a random neural units which send spikes toward the 
reservoir or toward the readout nodes. The reservoir dynamics 
is designed inspired by the equations of recurrent RandNNs 
(see below). Let us index the input neurons from 1 and N a , 
and the reservoir neurons from A a + 1 to A a + N x . 

When an input a is offered to the network, we first identify 
the rates of the external positive spikes with that input, that 
is: A+ = a u , and, as it is traditionally done in RandNNs, 
A~ = 0, for all u = 1, . . . , N a . In a standard RandNN, the 
neuron's loads are computed solving the expressions (TJ, Q 
and (|3j. More precisely, input neurons behave as a M/M/l 
queues. The load or activity rate of neuron u, u = 1, . . . , N a 
is, in the stable case (a u < r u ), simply g u — a u /r u . For 
reservoir units, the loads are computed solving the non-linear 
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system composed of equations ([TJ, (|2]) and The network 
is stable if all obtained loads are < 1. 

In our ESQN model, we do the same for input neurons, but 
for the reservoir, we introduce the concept of state. The state 
is simply the vector of loads g. When we need the network 
output corresponding to a new input a, we first compute a 
new state by using 



Qu 



v—l 



v=N a + l 



Qv W uv 



r 

v — l 



E SvW~ v 
v=N a .+l 



(6) 



for all u € [N a +1, N a +N x }. When this is seen as a dynamical 
system, on the left we have the loads at t, and on the r.h.s. 
the loads at t — 1. 

The readout part is computed by a parametric func- 
tion g(w out , a, g) (or <?(w out , a(i), g(t)) when this is used as 
a dynamical prediction system. In this paper we present the 
simple case of computing the readout using a linear regression. 
It is easy to change this by another type of function g, due to 
the independent structure between the reservoir and readout 
layers. Thus, the network output y(t) = [yi(t), . . . , VN b (t)] 
is computed for any m € [l,iVb] usm g expression |5]) and it 
can be written as follows, where we use the temporal version 
at time t: 



Vm(t) 



J m0 



E<^w 



E 

=l+iV a 



\Qi(t). (7) 



The output weights w out can be computed using some of the 
traditional algorithms to solve regressions such as the "ridge 
regression" or the least mean square algorithms [27|. 



V. Experimental Results 

In our numerical experiences, we consider a simulated time 
series data widely used in the ESN literature flO) , [28] and 
two real world data sets about Internet traffic, used in research 
work about forecasting techniques J29| , [30) . To evaluate the 
models' accuracy, we use the Normalized Mean Square Error 
(NMSE): 



NMSE = 



y 



(k) 

3 

(fc) 
j 



(8) 



where bj is the empirical mean, and where we use the same 
notation as before for the data. The positive and negative 
weights of the ESQN model and the initial reservoir state 
were randomly initialized in the intervals [0, 0.2] and [0, 1], 
respectively. As usual, the training performance can depend 
on the choice of the starting weights. To take this into account, 
we experiment with 20 different random initial weights and 
we calculate their average performance. The preprocessing 
data step consisted in rescaling the data in the interval [0,1]. 
The learning method used was offline ridge regression pT[ . 
This algorithm contains a regularization parameter which is 



Table I 

Comparison between ESN and ESQN. We give the NMSE 

OBTAINED FROM 20 INDEPENDENT TRIALS, AND THE CORRESPONDING 

Confidence Interval (CI), for an ESN model and the proposed 

ESQN PROCEDURE. THE RESERVOIR SIZE WAS 80 UNITS FOR THE FIRST 
DATA SET AND 40 UNITS FOR THE OTHER ONES. 



Series 


Model 


NMSE 


CI 


NARMA 


ESN 
ESQN 


0.1401 
0.1004 


±0.0504 
±0.0025 


ISP 


ESN 
ESQN 


0.0062 
0.0100 


±9.8885 x 10- y 
±1.2436 x 10" 4 


UKERNA 


ESN 
ESQN 


0.3781 
0.2030 


±0.0066 
±0.0335 



adjusted for each data set. The time series data considered 
were: 

1) Fixed 10th order nonlinear autoregressive moving aver- 
age (NARMA) system. The series is generated by the 
following expression: 

b(t + 1) = 0.3 b(t) + 0.05 6(i)E*=o W ~ i) 
+ 1.5 s(t - 9)s(t) + 0.1, where s(t) ~ Unif [0,0.5]. 
We generated a training data with 1990 samples and a 
validation set with 390 samples. 

2) Traffic data from an Internet Service Provider (ISP) 
working in 11 European cities. The original data is in 
bits and was collected every 5 minutes. We rescaled 
it in [0,1]. The size of the training data is 9848 and 
the size of validation set is 4924. The input neurons 
(N a = 7) are mapped to the last 7 points of the past 
data, that is with values from t — 6 up to time t. This 
configuration was suggested in [29] where the authors 
discuss different neural network topologies taking into 
account seasonal traits of the data. 

3) Traffic data from United Kingdom Education and Re- 
search Networking Association (UKERNA). The Inter- 
net traffic was collected every day. The network input at 
any time t is the triple composed of the traffic at times 
t, t — 6 and t — 7, as studied in |29) . This small data 
set has 47 training pairs and 15 validation samples. 

The NARMA series data was studied in deep in fl2) , |21|, 
|28|, p2| , p3|. For the last two data sets the performance 
using NN, ARIMA and Holt-Winters methods can be seen 



in 1 29 1. A typical ESN model consists in a reservoir with the 
following characteristics: random topology, 7Y X large enough, 
sparsely connected (roughly between 15% and 20% of their 
weights are non-zeros) [8|. The specific ESN used has a 
sparsity of 15% and spectral radius of 0.95 in its reservoir 
matrix. In [12], the authors obtained the best performance in 
the NARMA data problem when the spectral radius was close 
to 1. 

The ESN performance can be improved using leaky- 
integrator neurons p6) , feedback connections (Sj or initial- 
izing the reservoir weights using another initializing crite- 
ria J33~| , [34] . Both models have 80 units in the reservoir for 
the NARMA data and 40 units for the other two data sets. In 
this paper, in order to compare the performance of the ESQN 
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Example of ESQN estimation of the 80 validation instances of Narma series data 



The ESQN model accuracy for different reservoir sizes for NARMA series data 




Figure 1. Example of ESQN prediction for 80 time steps of fixed 10th 
NARMA validation data set. The reservoir was randomly initialized and it 
had 80 units. 



Example of ESQN 




Figure 2. Example of ESQN prediction for 20 instances in the validation 
set of the European ISP traffic data. The instances correspond to time steps 
between 4900 and 4920. The reservoir was randomly initialized and it had 40 
neurons. 



Figure 3. The ESQN model performance for different reservoir sizes which 
are computed for 10th NARMA validation data set. The reservoir weights 
were randomly initialized. Figure shows the NMSE average achieved in 20 
runs with different ESQN initial weights. 



Example of ESQN estimation for the last 1 instances of the validation UKERNA data 




Figure 4. ESQN estimation for UKERNA validation data set. The reservoir 
weights were randomly initialized. The reservoir size is 40. 



and ESN models we use the standard versions of each of them. 

Table [I] presents the accuracy of the ESQN and ESN 
models. In the last column we give a 95% confidence interval 
obtained from 20 independent runs. We can see that for the 
10th order NARMA and UKERNA data the performance 
obtained with ESQN is better than with ESN (even if in the 
NARMA case, the confidence intervals have a "slight" non- 
empty intersection). In the case of the European ISP data, ESN 
shows a significant better performance. Observe that in all 
cases the accuracy obtained with ESQN was very good. Also 
observe that we are using some years of cumulated knowledge 
about ESNs in our implementation, which we are comparing 
with our first versions of our new largely unexplored ESQN 
model. 

Figure [3] shows that the reservoir size is an important 
parameter affecting the performance of the ESQN. This also 



happens with the ESN model: in general, a larger reservoir 
enriches the learning abilities of the model. The sparsity and 
density of the reservoir in the ESQN model was not studied in 
this work. It is left for future efforts. NARMA is an interesting 
time series data where the outputs depend on both the input 
and previous outputs. The modeling problem is difficult to 
solve due to the non-linearity of the data and the necessity 
of having some kind of long memory. Figure [T] illustrates 
an estimation of ESQN with 80 units in the reservoir which 
are randomly chosen in [0,0.2]. Figures [5] and |4] show the 
prediction values for an interval of validation data. Figure [2] 
shows the prediction of 20 instances beginning at time 4900 
of the validation set. The main difficulty to model UKERNA 
data (using day scale) is that the training set is small. In 
spite of this, Figure [4] illustrates the good performance of the 
ESQN model. This figure shows the estimation of the last 10 
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instances in the validation data. 

VI. Conclusions 

In this contribution, we have presented a new type of 
Reservoir Computing model which we call Echo State Queu- 
ing Network (ESQN). It combines ideas from queueing and 
neural networks. It is based on two computational models: the 
Echo State Network (ESN) and the Random Neural Network. 
Both methods have been successfully used in forecasting 
and machine learning problems. Particularly, ESNs have been 
applied in many temporal learning tasks. Our model was 
used to predict three time series data which are widely used 
in the machine learning literature. In all cases tested, the 
performance results have been very good. We empirically 
investigated the relation between the reservoir size and the 
ESQN performance. We found that the reservoir size has a 
significant impact on the accuracy. Another positive property 
of ESQNs is their simplicity, since reservoir units are just 
counter functions. Last, our tool is very easy to implement, 
both in software and in hardware. 

There are still several aspects of the model to be studied in 
future work. Some examples are the impact of the sparsity of 
the reservoir weights, the weight initialization methods used, 
the scaling of reservoir weights and the utilization of leaky 
integrators. 
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